# Ukraine Local-level Election and Language Data (2001-2019)

## Overview  
This project constructs a hybrid local-level panel dataset that combines election returns (2006-2019) with native language composition (2001) in Ukraine. The unit of analysis is the **cluster**, which merges one or more precincts with one or more settlements. The dataset supports structured analysis of Ukraine’s smaller linguistic minorities (Bulgarian, Hungarian, Romanian/Moldovan), with an emphasis on electoral behavior in response to state language policy during the 2010s. The regions studied are Zakarpattia, Chernivtsi, and Odesa (the former Izmail region).

A *cluster* is defined as the smallest geographic unit in which both election and census data can be aligned. In many cases, a cluster corresponds to a single village and precinct, but in others it groups multiple precincts within one settlement or multiple villages within one precinct.

For example, in the fmr. Izmail region of Odesa Oblast:  
- 284 settlements  
- 370 precincts  
- 234 clusters  

Linguistic composition (2001) in that region: ~35% Russian, ~32% Ukrainian, ~19% Bulgarian, ~10% Moldovan, ~3% Gagauz.  
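
One way to read the cluster definition is as a connected-components problem: link each precinct to every settlement it serves, then treat each connected group as one cluster. The Python sketch below illustrates that reading only. It is not the project's build code, and the function name `build_clusters` and the precinct/settlement IDs are hypothetical.

```python
from collections import defaultdict


def build_clusters(links):
    """Group precincts and settlements into clusters.

    `links` is an iterable of (precinct_id, settlement_id) pairs, one pair
    for every settlement a precinct serves (hypothetical input format).
    Returns a list of (precinct_set, settlement_set) tuples.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each ID with its kind so a precinct and a settlement that happen
    # to share an ID are never confused.
    for precinct, settlement in links:
        union(("precinct", precinct), ("settlement", settlement))

    groups = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        kind, ident = node
        groups[find(node)][0 if kind == "precinct" else 1].add(ident)

    # Sort for a stable, reproducible ordering.
    return sorted(groups.values(), key=lambda g: (sorted(g[0]), sorted(g[1])))


# Hypothetical example: two precincts inside one town, one precinct covering
# two villages, and one village that matches a single precinct.
example_links = [
    ("P1", "A"), ("P2", "A"),
    ("P3", "B"), ("P3", "C"),
    ("P4", "D"),
]
for precincts, settlements in build_clusters(example_links):
    print(sorted(precincts), sorted(settlements))
```

Under this reading, a one-village, one-precinct cluster is simply a component with a single link, while chained overlaps (a precinct spanning two settlements, one of which is also split across precincts) collapse into one larger cluster.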

---

## Contents

- **Data/** – Election returns, precinct equivalence files for each electoral district, and geodata containing cluster shapes with language and election data (used to generate the maps in the appendix). 
- **Documentation/** – Codebook with lists of election variables along with Ukrainian and English descriptions, appendix with maps and robustness check, and source list.
- **README.md** – This file  

---

## Data Sources 

1. **Central Election Commission of Ukraine (CVK)**  
   Election returns for parliamentary and presidential contests across multiple years (2006–2019).  
   - E.g. “Вибори народних депутатів України 26 березня 2006 року [Election of the People’s Deputies … March 26, 2006].” Accessed October 14, 2025. https://www.cvk.gov.ua/pls/vnd2006/W6P001-2.html  
   - Additional CVK pages used: election returns portals for 2007, 2012, 2014, 2019 (parliamentary) and 2010, 2014, 2019 (presidential).  

2. **State Registry of Voters (DRV)**  
   Boundaries and polling-station metadata were extracted primarily from the **“Виборчі округи та дільниці [Electoral Districts and Polling Stations]”** portal (e.g. `pid=9`). Some manual consultation of DRV regional pages was also done to resolve ambiguities.  
   > State Registry of Voters (Ukraine). “Виборчі округи та дільниці [Electoral Districts and Polling Stations].” Accessed October 14, 2025. https://www.drv.gov.ua/portal/cm?pid=9  

3. **#данівиборів (Dani Vyboriv)**  
   Provided information on how to extract polling station boundaries for each electoral district, which were used to construct the cluster boundaries.
   > '#данівиборів' [Dani Vyboriv]. “Як ми отримали координати усіх виборчих дільниць [How We Obtained the Coordinates of All Polling Stations].” March 30, 2018. Accessed October 14, 2025. https://danivyboriv.net/archives/89  

4. **Ukrainian Center for Social Data**  
   Native-language distribution by settlement, from the 2001 All-Ukrainian Census, mapped onto cluster units.  
   > Ukrainian Center for Social Data. “Розподіл населення за рідною мовою … [Distribution of the Population by Native Language … 2001].” Accessed October 14, 2025. https://socialdata.org.ua/dani_data/lang2001/  

---

## Methodological Notes

- **Cluster construction**: Clusters reconcile mismatches between precinct boundaries and settlement-level census units. Where precincts divide a settlement, or a settlement's boundaries divide a precinct, clusters aggregate them into the smallest analytically coherent unit.  
- **Data alignment**: Election returns (CVK) are aggregated or disaggregated where necessary to match cluster boundaries. Census language data (2001) is mapped to clusters based on the location of each settlement's centroid. 
  - Linguistic data is from 2001, so any analysis assumes that language composition remained relatively stable afterward; interpret changes cautiously.  
  - Rural clusters help mitigate the ecological inference problem, but larger settlements limit the data's analytical reach. 
  - Some precincts or settlements were merged or split across election cycles; these were handled by crosswalking. 
  - A robustness check assessed the temporal consistency of clustered election data by running linear regressions on consecutive election pairings (2006-2007, 2007-2010, 2010-2012, 2012-2014, 2014-2019). A chart showing the results is available in the appendix. 
  - Equivalency files contain information on the various precinct codes used in each election cycle from 2006 to 2019.
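
The aggregation and robustness-check notes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the field names (`precinct`, `votes`, `valid`), the precinct-to-cluster lookup, and the functions `aggregate_to_clusters` and `ols` are all hypothetical, and the numbers are made up.

```python
from collections import defaultdict


def aggregate_to_clusters(precinct_rows, precinct_to_cluster):
    """Sum precinct-level counts into clusters and return vote shares.

    `precinct_rows` is a list of dicts with hypothetical keys
    'precinct', 'votes' (for one party or candidate), and 'valid'
    (valid ballots cast). Clusters with no valid ballots are skipped.
    """
    totals = defaultdict(lambda: {"votes": 0, "valid": 0})
    for row in precinct_rows:
        cluster = precinct_to_cluster[row["precinct"]]
        totals[cluster]["votes"] += row["votes"]
        totals[cluster]["valid"] += row["valid"]
    return {c: t["votes"] / t["valid"] for c, t in totals.items() if t["valid"] > 0}


def ols(x, y):
    """Simple least-squares fit of y on x; returns (slope, intercept, r_squared).

    Assumes x and y have equal length and neither is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Made-up data: two precincts in cluster C1, one precinct each in C2 and C3.
lookup = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C3"}
rows_first = [
    {"precinct": "P1", "votes": 30, "valid": 100},
    {"precinct": "P2", "votes": 50, "valid": 100},
    {"precinct": "P3", "votes": 20, "valid": 100},
    {"precinct": "P4", "votes": 60, "valid": 100},
]
rows_second = [
    {"precinct": "P1", "votes": 35, "valid": 100},
    {"precinct": "P2", "votes": 55, "valid": 100},
    {"precinct": "P3", "votes": 25, "valid": 100},
    {"precinct": "P4", "votes": 65, "valid": 100},
]

shares_first = aggregate_to_clusters(rows_first, lookup)
shares_second = aggregate_to_clusters(rows_second, lookup)

# Regress the later election's cluster shares on the earlier election's.
common = sorted(set(shares_first) & set(shares_second))
slope, intercept, r_squared = ols(
    [shares_first[c] for c in common],
    [shares_second[c] for c in common],
)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r_squared:.3f}")
```

Summing counts before computing shares weights each precinct by its turnout, which is usually preferable to averaging precinct-level percentages. A slope near 1 and a high R² for an election pairing would suggest that the clusters track the same populations across the two cycles.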

---

## Usage & Citation

Please cite the individual regional datasets as follows:

Shevgaonkar, Dhruv, 2025, "Odesa Region Local-Level Election and Language Data (2001-2019)", https://doi.org/10.7910/DVN/OPARTY, Harvard Dataverse, V1, UNF:6:UitPxK/BKPfoon+nkvixFg== [fileUNF]

---

## License

CC0 1.0

---

## Contact & Updates

For questions, bug reports, or updates, contact dshevgao@ucdavis.edu.

This project is currently active. 

---
